智能论文笔记

Contrastive learning-based pretraining improves representation and transferability of diabetic retinopathy classification models

Minhaj Nur Alam , Rikiya Yamashita , Vignav Ramesh , Tejas Prabhune , Jennifer I. Lim , R. V. P. Chan , Joelle Hallak , Theodore Leng , Daniel Rubin

分类：计算机视觉

2022-08-24

基于自我监督的基于学习的预科可以使用小标签的数据集开发可靠和广义的深度学习模型，从而减轻了标签生成的负担。本文旨在评估基于CL的预处理对可转介的性能与非转介糖尿病性视网膜病（DR）分类的影响。我们已经开发了一个基于CL的框架，具有神经风格转移（NST）增强，以生成具有更好表示和初始化的模型，以检测颜色底面图像中的DR。我们将CL预估计的模型性能与用成像网权重预测的两个最先进的基线模型进行了比较。我们通过减少标记的训练数据（降至10％）进一步研究模型性能，以测试使用小标签数据集训练模型的鲁棒性。该模型在EYEPACS数据集上进行了培训和验证，并根据芝加哥伊利诺伊大学（UIC）的临床数据进行了独立测试。与基线模型相比，我们的CL预处理的基础网模型具有更高的AUC（CI）值（0.91（0.898至0.930），在UIC数据上为0.80（0.783至0.820）和0.83（0.783至0.820）（0.801至0.853）。在10％标记的培训数据时，在UIC数据集上测试时，基线模型中的FoldusNet AUC为0.81（0.78至0.84），比0.58（0.56至0.64）和0.63（0.56至0.64）和0.63（0.60至0.66）。基于CL的NST预处理可显着提高DL分类性能，帮助模型良好（可从Eyepacs转移到UIC数据），并允许使用小的带注释的数据集进行培训，从而减少临床医生的地面真相注释负担。

translated by 谷歌翻译

Bengali Handwritten Digit Recognition using CNN with Explainable AI

Md Tanvir Rouf Shawon , Raihan Tanvir , Md. Golam Rabiul Alam

分类：计算机视觉 | 机器学习

2022-12-23

Handwritten character recognition is a hot topic for research nowadays. If we can convert a handwritten piece of paper into a text-searchable document using the Optical Character Recognition (OCR) technique, we can easily understand the content and do not need to read the handwritten document. OCR in the English language is very common, but in the Bengali language, it is very hard to find a good quality OCR application. If we can merge machine learning and deep learning with OCR, it could be a huge contribution to this field. Various researchers have proposed a number of strategies for recognizing Bengali handwritten characters. A lot of ML algorithms and deep neural networks were used in their work, but the explanations of their models are not available. In our work, we have used various machine learning algorithms and CNN to recognize handwritten Bengali digits. We have got acceptable accuracy from some ML models, and CNN has given us great testing accuracy. Grad-CAM was used as an XAI method on our CNN model, which gave us insights into the model and helped us detect the origin of interest for recognizing a digit from an image.

translated by 谷歌翻译

Brain Tumor Synthetic Data Generation with Adaptive StyleGANs

Usama Tariq , Rizwan Qureshi , Anas Zafar , Danyal Aftab , Jia Wu , Tanvir Alam , Zubair Shah , Hazrat Ali

分类：计算机视觉 | 机器学习

2022-12-04

Generative models have been very successful over the years and have received significant attention for synthetic data generation. As deep learning models are getting more and more complex, they require large amounts of data to perform accurately. In medical image analysis, such generative models play a crucial role as the available data is limited due to challenges related to data privacy, lack of data diversity, or uneven data distributions. In this paper, we present a method to generate brain tumor MRI images using generative adversarial networks. We have utilized StyleGAN2 with ADA methodology to generate high-quality brain MRI with tumors while using a significantly smaller amount of training data when compared to the existing approaches. We use three pre-trained models for transfer learning. Results demonstrate that the proposed method can learn the distributions of brain tumors. Furthermore, the model can generate high-quality synthetic brain MRI with a tumor that can limit the small sample size issues. The approach can addresses the limited data availability by generating realistic-looking brain MRI with tumors. The code is available at: ~\url{https://github.com/rizwanqureshi123/Brain-Tumor-Synthetic-Data}.

translated by 谷歌翻译

FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction

Samiul Alam , Luyang Liu , Ming Yan , Mi Zhang

分类：机器学习 | 计算机视觉

2022-12-03

Most cross-device federated learning (FL) studies focus on the model-homogeneous setting where the global server model and local client models are identical. However, such constraint not only excludes low-end clients who would otherwise make unique contributions to model training but also restrains clients from training large models due to on-device resource bottlenecks. In this work, we propose FedRolex, a partial training (PT)-based approach that enables model-heterogeneous FL and can train a global server model larger than the largest client model. At its core, FedRolex employs a rolling sub-model extraction scheme that allows different parts of the global server model to be evenly trained, which mitigates the client drift induced by the inconsistency between individual client models and server model architectures. We show that FedRolex outperforms state-of-the-art PT-based model-heterogeneous FL methods (e.g. Federated Dropout) and reduces the gap between model-heterogeneous and model-homogeneous FL, especially under the large-model large-dataset regime. In addition, we provide theoretical statistical analysis on its advantage over Federated Dropout and evaluate FedRolex on an emulated real-world device distribution to show that FedRolex can enhance the inclusiveness of FL and boost the performance of low-end devices that would otherwise not benefit from FL. Our code is available at https://github.com/MSU-MLSys-Lab/FedRolex.

translated by 谷歌翻译

SafeSpace MFNet: Precise and Efficient MultiFeature Drone Detection Network

Mahnoor Dil , Misha Urooj Khan , Muhammad Zeshan Alam , Farooq Alam Orakazi , Zeeshan Kaleem , Chau Yuen

分类：计算机视觉

2022-11-30

Unmanned air vehicles (UAVs) popularity is on the rise as it enables the services like traffic monitoring, emergency communications, deliveries, and surveillance. However, the unauthorized usage of UAVs (a.k.a drone) may violate security and privacy protocols for security-sensitive national and international institutions. The presented challenges require fast, efficient, and precise detection of UAVs irrespective of harsh weather conditions, the presence of different objects, and their size to enable SafeSpace. Recently, there has been significant progress in using the latest deep learning models, but those models have shortcomings in terms of computational complexity, precision, and non-scalability. To overcome these limitations, we propose a precise and efficient multiscale and multifeature UAV detection network for SafeSpace, i.e., \textit{MultiFeatureNet} (\textit{MFNet}), an improved version of the popular object detection algorithm YOLOv5s. In \textit{MFNet}, we perform multiple changes in the backbone and neck of the YOLOv5s network to focus on the various small and ignored features required for accurate and fast UAV detection. To further improve the accuracy and focus on the specific situation and multiscale UAVs, we classify the \textit{MFNet} into small (S), medium (M), and large (L): these are the combinations of various size filters in the convolution and the bottleneckCSP layers, reside in the backbone and neck of the architecture. This classification helps to overcome the computational cost by training the model on a specific feature map rather than all the features. The dataset and code are available as an open source: github.com/ZeeshanKaleem/MultiFeatureNet.

translated by 谷歌翻译

A Boosting Approach to Constructing an Ensemble Stack

Zhilei Zhou , Ziyu Qiu , Brad Niblett , Andrew Johnston , Jeffrey Schwartzentruber , Nur Zincir-Heywood , Malcolm Heywood

分类：神经与进化计算

2022-11-28

An approach to evolutionary ensemble learning for classification is proposed in which boosting is used to construct a stack of programs. Each application of boosting identifies a single champion and a residual dataset, i.e. the training records that thus far were not correctly classified. The next program is only trained against the residual, with the process iterating until some maximum ensemble size or no further residual remains. Training against a residual dataset actively reduces the cost of training. Deploying the ensemble as a stack also means that only one classifier might be necessary to make a prediction, so improving interpretability. Benchmarking studies are conducted to illustrate competitiveness with the prediction accuracy of current state-of-the-art evolutionary ensemble learning algorithms, while providing solutions that are orders of magnitude simpler. Further benchmarking with a high cardinality dataset indicates that the proposed method is also more accurate and efficient than XGBoost.

translated by 谷歌翻译

Automated Detection of Dolphin Whistles with Convolutional Networks and Transfer Learning

Burla Nur Korkmaz , Roee Diamant , Gil Danino , Alberto Testolin

分类：计算机视觉 | 机器学习

2022-11-28

Effective conservation of maritime environments and wildlife management of endangered species require the implementation of efficient, accurate and scalable solutions for environmental monitoring. Ecoacoustics offers the advantages of non-invasive, long-duration sampling of environmental sounds and has the potential to become the reference tool for biodiversity surveying. However, the analysis and interpretation of acoustic data is a time-consuming process that often requires a great amount of human supervision. This issue might be tackled by exploiting modern techniques for automatic audio signal analysis, which have recently achieved impressive performance thanks to the advances in deep learning research. In this paper we show that convolutional neural networks can indeed significantly outperform traditional automatic methods in a challenging detection task: identification of dolphin whistles from underwater audio recordings. The proposed system can detect signals even in the presence of ambient noise, at the same time consistently reducing the likelihood of producing false positives and false negatives. Our results further support the adoption of artificial intelligence technology to improve the automatic monitoring of marine ecosystems.

translated by 谷歌翻译

MAIL: Malware Analysis Intermediate Language

Shahid Alam

分类：自然语言处理

2022-11-06

This paper introduces and presents a new language named MAIL (Malware Analysis Intermediate Language). MAIL is basically used for building malware analysis and detection tools. MAIL provides an abstract representation of an assembly program and hence the ability of a tool to automate malware analysis and detection. By translating binaries compiled for different platforms to MAIL, a tool can achieve platform independence. Each MAIL statement is annotated with patterns that can be used by a tool to optimize malware analysis and detection.

translated by 谷歌翻译

From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data

Zichen Jeff Cui , Yibin Wang , Nur Muhammad Mahi Shafiullah , Lerrel Pinto

分类：机器人 | 人工智能 | 计算机视觉 | 机器学习

2022-10-18

While large-scale sequence modeling from offline data has led to impressive performance gains in natural language and image generation, directly translating such ideas to robotics has been challenging. One critical reason for this is that uncurated robot demonstration data, i.e. play data, collected from non-expert human demonstrators are often noisy, diverse, and distributionally multi-modal. This makes extracting useful, task-centric behaviors from such data a difficult generative modeling problem. In this work, we present Conditional Behavior Transformers (C-BeT), a method that combines the multi-modal generation ability of Behavior Transformer with future-conditioned goal specification. On a suite of simulated benchmark tasks, we find that C-BeT improves upon prior state-of-the-art work in learning from play data by an average of 45.7%. Further, we demonstrate for the first time that useful task-centric behaviors can be learned on a real-world robot purely from play data without any task labels or reward information. Robot videos are best viewed on our project website: https://play-to-policy.github.io

translated by 谷歌翻译

BanglaSarc: A Dataset for Sarcasm Detection

Tasnim Sakib Apon , Ramisa Anan , Elizabeth Antora Modhu , Arjun Suter , Ifrit Jamal Sneha , MD. Golam Rabiul Alam

分类：自然语言处理 | 人工智能

2022-09-27

作为世界上口语最广泛的语言之一，孟加拉国的使用在社交媒体世界中也在增加。讽刺是一种积极的陈述或言论，其基本的负面动机在当今的社交媒体平台中广泛使用。在过去的许多年中，英语的讽刺检测有了显着改善，但是有关孟加拉讽刺检测的情况仍然没有改变。结果，仍然很难识别孟加拉国中的讽刺，缺乏高质量的数据是主要因素。本文提出了Banglasarc，该数据集是专门为孟加拉文本数据讽刺检测的数据集。该数据集包含5112条评论/状态和从各种在线社交平台（例如Facebook，YouTube）以及一些在线博客中收集的内容。由于孟加拉语中分类评论的数据收集数量有限，因此该数据集将有助于确定讽刺的研究，认识到人们的情绪，检测到各种类型的孟加拉语表达式和其他领域。该数据集可在https://www.kaggle.com/datasets/sakibapon/banglasarc上公开获得。

translated by 谷歌翻译